In the Claims



1. (Currently Amended) A method of performing a domain-specific metasearch and
obtaining search results therefrom, said method comprising the steps of: 

providing a metasearch engine capable of accessing generic, web-based search engines and 
domain-relevant search engines; 

receiving a query inputted by a user to the metasearch engine and searching for documents on 
a selected set of said generic, web-based search engines and domain-relevant search engines which 
are relevant to the query; 

fetching raw data search results in the form of text documents from each member of the 
selected set; 

displaying the raw data on a user interface; 

supplying the raw data to a data mining module, wherein the data mining module forms 
clusters of related documents according to an unsupervised clustering procedure and wherein the 
data mining module, upon receiving the raw data, processes the raw data, independently of the
unsupervised clustering procedure, and prepares a single list of all of the documents, after
eliminating documents not reachable via the web; and

displaying the clusters of related documents on the user interface. 

2. (Original) The method of claim 1, wherein the unsupervised clustering procedure
performed by the data mining module employs a group-average-linkage technique to determine
relative distances between documents. 
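
As an illustration of the group-average-linkage idea in claim 2, the following minimal Python sketch repeatedly merges the two clusters whose average pairwise document distance is smallest. The function name `group_average_clusters`, the `distance` callable and the `threshold` stopping rule are assumptions made for the example; the claims do not specify how merging terminates.

```python
from itertools import combinations


def group_average_clusters(distance, n_docs, threshold):
    """Agglomerative clustering with group-average linkage.

    distance(i, j) is any symmetric document distance (assumed to be given).
    Merging stops once the closest pair of clusters is farther apart than
    `threshold`, a hypothetical stopping rule not taken from the claims.
    """
    clusters = [[i] for i in range(n_docs)]

    def average_distance(a, b):
        # Group average: mean distance over all cross-cluster document pairs.
        return sum(distance(i, j) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        (x, y), best = min(
            (((p, q), average_distance(clusters[p], clusters[q]))
             for p, q in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best > threshold:
            break
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return [sorted(c) for c in clusters]
```

Any distance function can be passed in, so the same routine works with a term-overlap score or with a toy numeric distance when experimenting.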

3. (Original) The method of claim 2, wherein the group-average-linkage technique employs
the following algorithm for determining a proximity score that defines the relative distances between 
documents:

Sij = 2 × (1/2 - N(Ti,Tj)/(N(Ti) + N(Tj)));

where Ti is a term in document i;
Tj is a term in document j;

N(Ti,Tj) is the number of co-occurring terms that documents i and j have in common;

N(Ti) is the number of terms found in document i; and
N(Tj) is the number of terms in document j.
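
The proximity score can be sketched in a few lines of Python. This assumes the score has the form Sij = 2 × (1/2 - N(Ti,Tj)/(N(Ti) + N(Tj))), which is a reading of a partly illegible formula, and it assumes each document is treated as a set of distinct terms; the claim does not say whether repeated terms are counted. The function name `proximity_score` is hypothetical.

```python
def proximity_score(terms_i, terms_j):
    """Return Sij = 2 * (1/2 - N(Ti,Tj) / (N(Ti) + N(Tj))).

    Assumption: N(.) counts distinct terms, so each document is reduced
    to a set before counting.
    """
    set_i, set_j = set(terms_i), set(terms_j)
    total = len(set_i) + len(set_j)
    if total == 0:
        # Assumption: two empty documents are treated as identical.
        return 0.0
    shared = len(set_i & set_j)
    return 2 * (0.5 - shared / total)
```

Under these assumptions the score is 0 for documents with identical term sets and 1 for documents with no terms in common, so it behaves as a distance.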

4. (Cancelled) 

5. (Currently Amended) The method of claim 1, wherein the data mining module assigns
simple relevance scores to the documents prepared in the single list, based upon a frequency of terms 
from the query that appear within each of the documents. 

6. (Original) The method of claim 5, wherein the documents are listed in the single list in an
order ranging from a highest of the simple relevance scores to a lowest of the simple relevance 
scores. 
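
A minimal sketch of the simple relevance scoring and ordering recited in claims 5 and 6 follows. The names `simple_relevance` and `ranked_list`, the lowercase alphanumeric tokenizer and the alphabetical tie-break are assumptions for the example, not details taken from the claims.

```python
import re
from collections import Counter

TOKEN = re.compile(r"[a-z0-9]+")


def simple_relevance(query, text):
    """Count how often the query's terms occur in a document's text."""
    counts = Counter(TOKEN.findall(text.lower()))
    query_terms = set(TOKEN.findall(query.lower()))
    return sum(counts[term] for term in query_terms)


def ranked_list(query, documents):
    """Return (score, doc_id) pairs ordered from highest to lowest score.

    `documents` maps a document id to its text. Ties are broken by id,
    an arbitrary choice made only to keep the output deterministic.
    """
    scored = [(simple_relevance(query, text), doc_id)
              for doc_id, text in documents.items()]
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))
```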

7. (Original) The method of claim 1, further comprising the step of providing customized
stop word lists to be used with regard to the generic, web-based search engines and domain-relevant 
search engines, wherein the data mining module references the stop word lists to strip stop words 
from documents associated with a respective generic, web-based engine or domain-relevant engine 
for which the particular stop word list being referred to has been customized, prior to determining the
frequency of terms from the query that appear within each of the documents and computing a 
similarity score between results. 
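
The per-engine stop word step of claim 7 might look like the sketch below. The source keys (`generic_web`, `publications`), the example stop words and the function name `strip_stop_words` are hypothetical; real lists would be supplied per search engine or site.

```python
# Hypothetical stop word lists, one per search source.
STOP_WORDS = {
    "generic_web": {"the", "a", "of", "home", "click"},
    "publications": {"the", "a", "of", "abstract", "journal"},
}


def strip_stop_words(text, source, stop_words=STOP_WORDS):
    """Tokenize on whitespace and drop the stop words customized for `source`.

    Sources without a list keep all of their tokens.
    """
    stop = stop_words.get(source, set())
    return [token for token in text.lower().split() if token not in stop]
```

The stripped token lists would then feed both the term-frequency scoring and the similarity computation, so that boilerplate specific to one source does not inflate either.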

8. (Original) The method of claim 7, wherein the step of providing customized stop word
lists comprises providing predefined customized stop word lists. 

9. (Original) The method of claim 7, wherein the step of providing customized stop word
lists comprises automatically generating stop word lists which are prepared and customized for each
query. 

10. (Original) The method of claim 5, further comprising displaying the single list on the 
user interface. 

11. (Original) The method of claim 1, wherein the data mining module, upon receiving the
raw data, processes the raw data, independently of the unsupervised clustering procedure, and 
categorizes the documents so that each document is assigned to one of a predefined number of 
categories. 

12. (Original) The method of claim 11, further comprising providing a list of words for each
of the categories wherein the words in each list are particular to the respective category, and wherein
the data mining module compares the words in a particular list to a document to be characterized to
determine whether the document is classified in that particular category. 
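
One way to picture the word-list comparison of claim 12 is the sketch below, which assigns a document to the category whose word list it overlaps most. The overlap-count rule, the `uncategorized` fallback and the name `categorize` are assumptions; the claim only says that the lists are compared to the document.

```python
def categorize(text, category_words, default="uncategorized"):
    """Assign a document to the category whose word list it overlaps most.

    `category_words` maps a category name to a set of characteristic words.
    Documents matching no list fall back to `default` (a hypothetical choice).
    """
    tokens = set(text.lower().split())
    best, best_hits = default, 0
    for category, words in category_words.items():
        hits = len(tokens & words)
        if hits > best_hits:
            best, best_hits = category, hits
    return best
```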

13. (Original) The method of claim 12, wherein the step of providing a list of words for each 
of the categories comprises providing predefined lists.

14. (Original) The method of claim 12, wherein the step of providing a list of words for each
of the categories comprises automatically generating the word lists which are prepared from a set of 
training documents. 

15. (Original) The method of claim 14, wherein each word automatically selected for the
generation of the word lists is identified based on a function computed from a frequency of
occurrence of the word in the particular category for which it is selected, relative to a frequency of
occurrence of the word in the other existing categories.
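
Claim 15 leaves the selection function open. As one possible instance, the sketch below scores each word by its relative frequency inside a category divided by its summed relative frequency in the other categories. The ratio itself, the `epsilon` smoothing constant and the name `select_category_words` are assumptions for illustration only.

```python
from collections import Counter


def select_category_words(training_docs, top_n=5, epsilon=0.01):
    """Pick the words most characteristic of each category.

    `training_docs` maps a category name to a list of tokenized documents.
    `epsilon` is a hypothetical smoothing constant that avoids division by
    zero for words that never occur outside the category.
    """
    rel_freq = {}
    for category, docs in training_docs.items():
        counts = Counter(token for doc in docs for token in doc)
        total = sum(counts.values()) or 1
        rel_freq[category] = {word: n / total for word, n in counts.items()}

    selected = {}
    for category, freqs in rel_freq.items():
        def score(word):
            elsewhere = sum(
                other.get(word, 0.0)
                for name, other in rel_freq.items()
                if name != category
            )
            return freqs[word] / (elsewhere + epsilon)

        # Highest score first; alphabetical order breaks ties.
        ranked = sorted(freqs, key=lambda word: (-score(word), word))
        selected[category] = ranked[:top_n]
    return selected
```

Words that are frequent everywhere, such as a term shared by all training categories, receive a low score and are pushed out of the lists.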

16. (Original) The method of claim 12, wherein the step of providing a list of words for each 
of the categories comprises automatically generating the word lists which are prepared by 
incremental training using previously selected lists of words and corresponding categories, as well as 
user feedback regarding the categorization of at least one of the documents. 

17. (Original) The method of claim 11, wherein, upon completion of categorization of the
documents, the documents are displayed in a categorized format to the user interface.

18. (Original) The method of claim 1, wherein the metasearch engine is further capable of 
accessing in-house, proprietary databases and any other informational databases that can be wrapped
in a CGI-based web application server.

19. (Original) The method of claim 1, further comprising the steps of:

displaying a list of the generic search engines and domain-relevant search engines on the user 
interface which are available for searching; and

receiving a selection of all or part of the list from the user for directing the query thereto. 

20. (Original) The method of claim 19, further comprising providing a context menu by
which a user can select a group of search sites or engines by selecting a single context entry. 

21. (Original) The method of claim 20, wherein the context menu includes at least one of the
presets selected from the group consisting of a publications preset which selects more than one 
publications site, a sequences preset which selects more than one sequences site, a generic, web- 
based search engines preset which selects more than one generic, web-based search engine, a protein 
structure databases preset which selects more than one protein structure database, and a pathway 
information databases preset which selects more than one pathway information database. 
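
The context menu presets of claims 20 and 21 amount to a mapping from one menu entry to a group of sites. A minimal sketch follows; every site identifier in `PRESETS` is a placeholder invented for the example, and `select_sites` is a hypothetical helper.

```python
# Preset names follow claim 21; the site identifiers are placeholders only.
PRESETS = {
    "publications": ["publication_site_1", "publication_site_2"],
    "sequences": ["sequence_site_1", "sequence_site_2"],
    "generic_web": ["web_engine_1", "web_engine_2"],
    "protein_structures": ["structure_db_1", "structure_db_2"],
    "pathways": ["pathway_db_1", "pathway_db_2"],
}


def select_sites(context_entries, presets=PRESETS):
    """Expand the chosen context entries into an ordered, duplicate-free site list."""
    selected = []
    for entry in context_entries:
        for site in presets[entry]:
            if site not in selected:
                selected.append(site)
    return selected
```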

22. (Original) The method of claim 1, wherein the documents consist of text-based data.

23. (Original) The method of claim 1, further comprising the steps of:
storing at least one of the raw data and the clusters; 

performing the steps of claim 1 to accomplish an additional search and data mining 
procedure; 

storing at least one of the raw data and the clusters obtained from the additional search and 
data mining procedure; 

receiving a sub-query inputted by a user to the metasearch engine and searching for 
documents from the data stored by the storing steps performed in regard to previous searches, which
are relevant to the sub-query;

fetching raw data sub-query search results in the form of text documents from the stored data;

displaying the raw data sub-query search results on a user interface; 

supplying the raw data sub-query search results to the data mining module, wherein the data 
mining module forms clusters of related documents according to an unsupervised clustering
procedure; and 

displaying the clusters of related documents resultant from the sub-query search on the user 
interface.

24. (Original) The method of claim 1, further comprising:

providing a browser including a relevance feedback mechanism; 

analyzing the documents as they are browsed by a user on the user interface; and 

generating a relevance weighting factor based upon observations resulting from the
analyzing. 

25. (Original) The method of claim 24, wherein the relevance weighting factor is applicable 
to a particular document having been browsed during the analyzing. 

26. (Original) The method of claim 24, wherein the relevance weighting factor is applicable 
to a site or search engine from which a particular document having been browsed during the 
analyzing was fetched. 

27. (Original) The method of claim 24, wherein the relevance weighting factor is applicable 
to a cluster in which a particular document having been browsed during the analyzing is grouped. 

28. (Original) The method of claim 24, wherein the relevance weighting factor is applicable 
to a category in which a particular document having been browsed during the analyzing is 
categorized.
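
Claims 24 through 28 describe a relevance weighting factor that may apply to a document, its source site, its cluster or its category, without saying what is observed. The sketch below assumes, purely for illustration, that viewing time is the observation and that the factor is capped at 1.0; the class `RelevanceFeedback`, the `full_credit_seconds` parameter and the record keys are all hypothetical.

```python
from collections import defaultdict

LEVELS = ("document", "site", "cluster", "category")


class RelevanceFeedback:
    """Accumulate a weighting factor at each level a browsed document belongs to."""

    def __init__(self, full_credit_seconds=60.0):
        # Hypothetical rule: viewing for this long (or longer) earns a factor of 1.0.
        self.full_credit_seconds = full_credit_seconds
        self.weights = {level: defaultdict(float) for level in LEVELS}

    def observe(self, doc, seconds_viewed):
        """Record one browsing observation; `doc` has one key per level."""
        factor = min(seconds_viewed / self.full_credit_seconds, 1.0)
        for level in LEVELS:
            self.weights[level][doc[level]] += factor
        return factor
```

Keeping one table per level lets the same observation raise the weight of the document itself and of the site, cluster and category it belongs to.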

29. (Original) The method of claim 1, further comprising:
storing at least one of the raw data and the clusters; 

performing the steps of claim 1 to accomplish an additional search and data mining 
procedure; 

providing a browser including a relevance feedback mechanism; 

analyzing the documents displayed from the additional search as they are browsed by a user 
on the user interface, wherein the analyzing includes comparing the documents being browsed with 
the stored data; and 

generating a relevance weighting factor based upon observations resulting from the 
analyzing. 

30. (Currently Amended) A method of performing a life science-specific domain-specific
metasearch and obtaining search results therefrom, the method comprising the steps of:

providing a metasearch engine capable of accessing generic, web-based search engines,
publication sites, sequences sites, protein structure databases and pathway information databases; 

receiving a query inputted by a user to the metasearch engine and searching for documents on
a selected set of the generic, web-based search engines, publications sites, sequences sites, protein 
structure databases and pathway information databases which are relevant to the query; 

fetching raw data search results in the form of text documents from each member of the 
selected set; 

displaying the raw data search results on a user interface; 

supplying the raw data to a data mining module optimized specifically for the life sciences,
wherein the data mining module prepares a single list of all of the documents, after eliminating 
documents not reachable via the web, and assigns simple relevance scores to the documents 
prepared in the single list; forms clusters of related documents according to an unsupervised 
clustering procedure; and categorizes the documents so that each document is assigned to one of a 
predefined number of categories; and 

displaying the documents in a format defined by the single list, in a format defined by the 
clusters, and in a format defined by the categories on the user interface so that a user can choose to 
browse the documents according to the list format, cluster format or categories format. 
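
The single-list step, which merges results from all selected sources and drops documents not reachable via the web, could be sketched as follows. The name `build_single_list`, the `url` field and the injected `is_reachable` callable are hypothetical; the claims do not say how reachability is tested, and the removal of duplicate URLs is an added assumption.

```python
def build_single_list(results_by_engine, is_reachable):
    """Merge per-engine results into one list, skipping unreachable documents.

    `results_by_engine` maps an engine name to a list of dicts with a "url" key.
    `is_reachable(url)` is supplied by the caller (for example an HTTP check).
    Duplicate URLs are dropped, which is an assumption beyond the claim text.
    """
    seen = set()
    merged = []
    for engine, docs in results_by_engine.items():
        for doc in docs:
            url = doc["url"]
            if url in seen or not is_reachable(url):
                continue
            seen.add(url)
            merged.append({**doc, "engine": engine})
    return merged
```

Passing the reachability test in as a function keeps the merge logic testable without network access.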

31. (Currently Amended) A method of performing a domain-specific metasearch and
obtaining search results therefrom, said method comprising the steps of: 

providing a metasearch engine capable of accessing generic, web-based search engines and
domain-relevant search engines; 

receiving a query inputted by a user to the metasearch engine and searching for documents on 
a selected set of said generic, web-based search engines and domain-relevant search engines which 
are relevant to the query; 

fetching raw data search results in the form of text documents from each member of the 
selected set; 

supplying the raw data to a data mining module, wherein the data mining module forms 
clusters of related documents according to an unsupervised clustering procedure, and wherein the
data mining module, upon receiving the raw data, processes the raw data, independently of the
unsupervised clustering procedure, and prepares a single list of all of the documents, after
eliminating documents not reachable via the web and wherein the data mining module categorizes
the documents so that each document is assigned to one of a predefined number of categories; and 

displaying the documents in a format defined by the clusters, and in a format defined by the
categories on a user interface so that a user can choose to browse the documents according to the 
cluster format or the categories format. 

32. (Original) The method of claim 31, further comprising:
storing at least one of the raw data and the clusters; 

performing the steps of claim 31 to accomplish an additional search and data mining
procedure; 

providing a browser including a relevance feedback mechanism; 

analyzing the documents displayed from the additional search as they are browsed by a user
on the user interface, wherein the analyzing includes comparing the documents being browsed with 
the stored data; and 

generating a relevance weighting factor based upon observations resulting from the 
analyzing. 

33. (Currently Amended) A computer system for searching both general and 
domain-specific information resources simultaneously pursuant to a user query and for 
obtaining organized search results therefrom, the system comprising: 

a metasearch engine capable of accessing a plurality of sites including generic, web-based
search engines and domain-relevant search engines, for receiving documents from said plurality of 
sites in response to the user query; 

means for selecting particular search engines from a plurality of generic, web-based search
engines and domain-relevant search engines that are presented to a user; 

means for displaying the received documents to the user; 

means for assembling the received documents from the plurality of sites searched by the 
selected particular search engines into a single list after eliminating documents not reachable via the
web; 

means for assigning relevance ranks to the received documents in the single list and for
organizing the documents in the single list according to said relevance ranks;

means for clustering the received documents into clusters according to an unsupervised 
clustering procedure; 

and means for displaying said single list and said clusters to the user. 

34. (Original) The computer system of claim 33, wherein said means for assigning relevance
ranks assigns the relevance rank based upon a frequency of occurrence of query terms in each of the 
received documents. 

35. (Original) The computer system of claim 33, further comprising: 

means for providing customized stop word lists to be used with regard to said generic, web-
based search engines and domain-relevant search engines, wherein said means for assigning
relevance ranks references said stop word lists to strip stop words from documents associated with a
respective engine for which the particular stop word list being referred to has been customized, prior
to determining a frequency of terms that appear within each said document, and wherein said terms
are used to determine proximity scores between said documents. 

36. (Original) The computer system of claim 33, wherein said unsupervised clustering
procedure performed by means for clustering employs a group-average-linkage technique to
determine relative distances between documents. 

37. (Original) The computer system of claim 36, wherein said group-average-linkage
technique employs the following algorithm for determining a proximity score that defines said 
relative distances between documents: 

Sij = 2 × (1/2 - N(Ti,Tj)/(N(Ti) + N(Tj)));
where Ti is a term in document i;
Tj is a term in document j;

N(Ti,Tj) is the number of co-occurring terms that documents i and j have in common;

N(Ti) is the number of terms found in document i; and
N(Tj) is the number of terms in document j.

38. (Original) The computer system of claim 33, further comprising: 

means for categorizing the received documents, so that each document is assigned to one of a 
predefined number of categories; and 

means for displaying said categories and said documents assigned thereto to the user.

39. (Original) The computer system of claim 38, further comprising means for storing a list 
of words for each of said categories wherein said words in each list are particular to the respective 
category, and wherein said means for categorizing compares the words in a particular list to a
document to be characterized to determine whether the document is classified in that particular
category.

Atty Dkt No.: 10010724-2
USSN: 10/033,323

40. (Original) The computer system of claim 38, further comprising means for providing a 
predefined list of words for each of the categories. 

41. (Original) The computer system of claim 38, further comprising means for automatically 
generating a word list for each of the categories. 

42. (Original) The computer system of claim 41, wherein said word lists are prepared from a
set of training documents. 

43. (Original) The computer system of claim 41, wherein each word automatically selected 
for the generation of the word lists is identified based on a function computed from a frequency of 
occurrence of the word in the particular category for which it is selected, relative to a frequency of
occurrence of the word in the other existing categories. 

44. (Original) The method of claim 41, wherein said word lists are prepared by incremental 
training using previously selected lists of words and corresponding categories, as well as user 
feedback regarding the categorization of at least one of the documents contained in at least one of the 
categories. 

45. (Original) The computer system of claim 33, further comprising: 
means for storing said received documents; and 

means for performing a sub-query inputted by a user to search for documents stored by said 
means for storing which are relevant to said sub-query; 

means for fetching raw data sub-query search results from said means for storing in the form
of text documents;

means for displaying said raw data sub-query search results to the user; 
means for assembling the raw data sub-query search results into a single list;
means for assigning relevance ranks to the raw data sub-query search results and for 
organizing the results in the single list according to said relevance ranks; 
means for clustering the received sub-query documents into clusters according to an
unsupervised clustering procedure; 

and means for displaying said sub-query documents to the user in the single list and clusters
formats. 

46. (Original) The computer system of claim 33, further comprising:

a browser including a relevance feedback mechanism, adapted to analyze the documents as
they are browsed by a user on a user interface; and to generate a relevance weighting factor based
upon observations resulting from the analysis.

47. (Currently Amended) A computer system for searching both general and
domain-specific information resources simultaneously pursuant to a user query and for 
obtaining organized search results therefrom, the system comprising: 

a metasearch engine capable of accessing a plurality of sites including generic, web-based
search engines and domain-relevant search engines for receiving documents from said plurality of 
sites in response to the user query; 

means for selecting particular search engines from a plurality of generic, web-based search 
engines and domain-relevant search engines that are presented to a user; 

means for clustering the received documents into clusters according to an unsupervised 
clustering procedure; 

means for preparing a single list of all of the documents, independently of said forming
clusters, after eliminating documents not reachable via the web;

means for categorizing the received documents, so that each document is assigned to one of a 
predefined number of categories; and 

means for displaying said clusters, said categories and said documents assigned thereto to the
user.

48. (Currently Amended) The computer system [illegible]

means for displaying the received documents to the user; 

means for assigning relevance ranks to the received documents in the single list and for 
organizing the documents in the single list according to said relevance ranks; 
means for storing said received documents; and 

means for performing a sub-query inputted by a user to search for documents stored by said
means for storing which are relevant to said sub-query; 

means for fetching raw data sub-query search results from said means for storing in the form 
of text documents; 

means for displaying said raw data sub-query search results to the user; 

means for assembling the raw data sub-query search results into a single list; 

means for assigning relevance ranks to the raw data sub-query search results and for 
organizing the results in the single list according to said relevance ranks; 

means for clustering the received sub-query documents into clusters according to an 
unsupervised clustering procedure; 

means for categorizing the received sub-query documents, so that each document is assigned 
to one of a predefined number of categories; and 

means for displaying said sub-query documents to the user in the single list, categories and 
clusters formats.

49. (Original) The computer system of claim 47, further comprising: 

a browser including a relevance feedback mechanism, adapted to analyze the documents as 
they are browsed by a user on a user interface; and to generate a relevance weighting factor based 
upon observations resulting from the analysis. 

50. (Currently Amended) A computer readable medium carrying one or more sequences of 
instructions from a user of a computer system for searching both general and domain-specific 
information resources simultaneously to obtain organized search results therefrom, wherein 
execution of the one or more sequences of instructions by one or more processors causes the one or 
more processors to perform the steps of:

receiving a query inputted by the user and receiving instructions as to which databases to
access;

accessing selected sites using generic, web-based [illegible]
engines, based upon said instructions received from the user, and searching for documents on the
selected sites, which are relevant to said query; 

fetching raw data search results in the form of text documents from each of the selected sites;

displaying said raw data on a user interface; 

forming clusters of related documents from said raw data, according to an unsupervised
clustering procedure and processing said raw data, independently of said unsupervised clustering
procedure, and categorizing said documents so that each document is assigned to one of a predefined
number of categories; and

displaying said clusters of related documents on the user interface. 

51. (Original) The computer readable medium of claim 50, wherein the following further 
steps are performed:

preparing a single list of all of said documents, independently of said forming clusters, after 
eliminating documents not reachable via the web; and 

assigning simple relevance scores to said documents prepared in said single list, based upon a 
frequency of terms from said query that appear within each said document.

52. (Original) The computer readable medium of claim 51, wherein the following further
step is performed: 

providing customized stop word lists to be used with regard to said generic, web-based search 
engines, publication sites and sequences sites, and referencing said stop word lists to strip stop words 
from documents associated with a respective engine, publication site or sequence site for which the 
particular stop word list being referred to has been customized, prior to determining said frequency
of terms that appear within each said document and using the terms to compute proximity scores
between the documents for clustering the documents.

53. (Cancelled) 

54. (Original) The computer readable medium of claim 50, wherein the following further 
steps are performed: 

providing a browser including a relevance feedback mechanism; 

analyzing the documents as they are browsed by the user; and

generating a relevance weighting factor based upon observations resulting from said 
analyzing. 